1 Introduction

The purpose of this supplement is to provide a more complete account of the mathematics 
underlying our analyses in the main text. In particular, the order complex and clique 
topology are described more precisely here. The order complex of a matrix is analogous 
to its Jordan Form, in that it captures features that are invariant under a certain type 
of matrix transformation. Likewise, the clique topology of a matrix is analogous to its 
eigenvalue spectrum, in that it provides a set of invariants that can be used to detect 
structure. While the Jordan Form and eigenvalue spectrum are invariant under linear 
change of variables, the order complex and clique topology are invariant under monotonic 
transformations of the matrix entries. 

Seeking quantities that are invariant under linear coordinate transformations is natural
in physical applications, where measurements are often performed with respect to
an arbitrary basis, such as the choice of x, y and z directions in physical space. In contrast,
measurements in biological settings are often obtained as nonlinear (but monotonic)
transformations of the underlying “real” variables, while the choice of basis is meaningful
and fixed. For example, basis elements might represent particular neurons or genes, and
measurements (matrix elements) could consist of pairwise correlations in neural activity,
or the co-expression of pairs of genes. Unlike change of basis, these transformations are
of the form

$$L_{ij} = f(M_{ij}),$$

where $f$ is a nonlinear, but monotonically increasing function that is applied to each entry
of M. The Jordan Form of a matrix, and its spectrum, may be badly distorted by such
transformations; both also discard basis information which may be meaningful and should
be preserved.
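
The contrast between the two kinds of invariance can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random test matrix and the choice f = exp are illustrative assumptions, not data or functions from the analyses described here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a symmetric 5 x 5 matrix with distinct entries.
A = rng.normal(size=(5, 5))
M = (A + A.T) / 2

# A nonlinear, monotonically increasing function applied to each entry.
L = np.exp(M)

# The ordering of the off-diagonal (upper-triangular) entries is unchanged...
iu = np.triu_indices(5, k=1)
same_order = np.array_equal(np.argsort(M[iu]), np.argsort(L[iu]))

# ...while the eigenvalue spectrum is not preserved.
spectrum_changed = not np.allclose(np.linalg.eigvalsh(M), np.linalg.eigvalsh(L))

print(same_order, spectrum_changed)
```

Any other strictly increasing function could be substituted for `np.exp` in this sketch.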

Given a symmetric, N x N matrix that reflects correlations or similarities between N 
entities (such as neurons, imaging voxels, etc.), we have two basic questions: 

Q1. Is the matrix a monotonic transformation of a random or geometric¹ matrix?

Q2. Can we distinguish between these two possibilities, without knowing f?

Perhaps surprisingly, information sufficient to answer these questions is contained in the
ordering of matrix entries, and is encoded in its order complex, to be described in the next
section. To extract the relevant features, we compute certain topological invariants of the
order complex, which we refer to as the clique topology of the matrix. The motivation for
this choice stems from recent mathematical results by M. Kahle [Kah09], describing the
clique topology of random symmetric matrices asymptotically (for large N); and our own
computational results, showing that random and “generic” Euclidean distance matrices
can be readily distinguished using clique topology for N ~ 100.

We have made an effort to keep these explanations self-contained, but details of how 
certain computations are performed have been left to the references for the sake of brevity. 
Standard material from algebraic topology [Hat02] is described in a minimal fashion, with
an emphasis on homology of clique complexes. The reader is expected to be familiar with 
linear algebra. 

Comparison to prior applications in biology 

Topological data analysis has previously been used in biological applications to identify
individual persistent cycles that may have meaningful interpretation [CI08, SMI+08a,
NLC11, DMFC12, CCR13a, CGYW14]. In contrast, our approach relies on the statistical
properties of cycles, as captured by Betti curves, in order to detect geometric structure (or
randomness) in symmetric matrices. In particular, the relevant space from which the data
points are sampled may not possess any meaningful persistent cycles, as in the square box
environment covered by place fields. The background Euclidean geometry, however, has
a strong effect on the statistics of cycles, enabling detection of geometric structure and
providing a sharp contrast to Betti curves of random matrices with i.i.d. entries.

¹ Recall from the main text that a geometric matrix refers to a matrix of (negative) Euclidean distances
among random points in $\mathbb{R}^d$.

2 The order complex 

Recall that a function $f : \mathbb{R} \to \mathbb{R}$ is said to be monotonically increasing if $f(x) > f(y)$
whenever $x > y$. Let $f : \mathbb{R} \to \mathbb{R}$ be a monotonically increasing function. For any
real-valued matrix M, we define the matrix $f \cdot M$ by

$$(f \cdot M)_{ij} = f(M_{ij}).$$

Note that this action preserves the ordering of matrix entries. That is, if $L = f \cdot M$, then
all pairs of off-diagonal entries, $(i,j)$ and $(k,\ell)$, satisfy:

$$L_{ij} < L_{k\ell} \iff M_{ij} < M_{k\ell}.$$

Equivalence classes of matrices can thus be represented by integer-valued matrices that
record the ordering of off-diagonal entries (and carry no information on the diagonal).
Figure 1 shows three matrix orderings for N = 5.

Figure 1: Three matrix orderings (panels labeled d ≥ 1, d ≥ 2, d ≥ 3), reproduced from
Figure 2a in the main text.

For a given symmetric matrix M, we denote the representative matrix ordering by $\hat{M}$, where

$$\hat{M}_{ij} = \left|\{(k,\ell) \mid 0 < k < \ell \le N \text{ and } M_{k\ell} < M_{ij}\}\right|$$

simply counts the number of upper-triangular entries of M that are smaller than $M_{ij}$ for
$i \neq j$, while the diagonal entries of $\hat{M}$ are left undefined ($|\cdot|$ denotes the size of the set).
If $M_{ij}$ is the smallest off-diagonal entry, then $\hat{M}_{ij} = 0$; if $M_{ij}$ is the largest matrix entry,
and all upper-triangular entries are distinct, then $\hat{M}_{ij} = \binom{N}{2} - 1$. With this notation, we
have:

Lemma 2.1. $\hat{L} = \hat{M}$ if and only if there exists a monotonically increasing function f
such that $L = f \cdot M$.

Proof. ($\Leftarrow$) is obvious, since the action of f preserves the ordering of matrix entries. ($\Rightarrow$)
One can construct $f : \mathbb{R} \to \mathbb{R}$ by setting $f(M_{ij}) = L_{ij}$ for each off-diagonal entry, and
interpolating monotonically (e.g., linearly). Since we assume $\hat{L} = \hat{M}$, this function is
monotonically increasing and well-defined. □
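
The definition of the matrix ordering and the easy direction of Lemma 2.1 can be illustrated with a short sketch. The function name `matrix_ordering` and the use of -1 to mark the undefined diagonal are assumptions made for this example only.

```python
import numpy as np

def matrix_ordering(M):
    """Return the integer matrix ordering of a symmetric matrix M.

    Entry (i, j), i != j, counts the upper-triangular entries of M that are
    strictly smaller than M[i, j]. The diagonal is undefined and marked -1.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    upper = M[np.triu_indices(n, k=1)]
    # For each (i, j), count the upper-triangular entries below M[i, j].
    hat = (upper[None, None, :] < M[:, :, None]).sum(axis=2)
    hat[np.diag_indices(n)] = -1
    return hat

M = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
print(matrix_ordering(M))
# Applying a monotonically increasing function leaves the ordering unchanged.
print(np.array_equal(matrix_ordering(M), matrix_ordering(np.exp(M))))
```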

In order to analyze the information present in the ordering of entries for an N x N 
symmetric matrix, it is useful to represent it as a sequence of nested simple graphs. Recall 
that a simple graph G is a pair ([N], E ), where [N] = {1,2,..., N} is the ordered set of 
vertices, and E is the set of edges. Each edge is undirected and connects a unique pair 
of distinct vertices (no self-loops). We will use the notation $(ij) \in G$ to indicate that the
edge corresponding to vertices i, j is in the graph.



Figure 2: Selected graphs in an order complex (panel b, ord(M) = ord(L); edge densities
ρ = 0.008, 0.1, 0.25, 0.45), adapted from Figure 1 in the main text.


Definition 2.2. Let M be a real symmetric matrix with matrix ordering $\hat{M}$, and let
$p = \max_{i<j} \hat{M}_{ij}$. The order complex of M, denoted ord(M), is the sequence of graphs

$$G_0 \subset G_1 \subset \cdots \subset G_{p+1},$$

such that

$$(ij) \in G_r \iff \hat{M}_{ij} > p - r \quad \text{for each } r = 0, \ldots, p + 1.$$


Note that $G_0$ has no edges, $G_1$ contains only the edge (ij) corresponding to the largest
off-diagonal entry of M, and subsequent graphs are each obtained from the previous one
by adding an additional edge for each next-largest entry until we reach the complete
graph, $G_{p+1}$. A portion of an order complex is illustrated in Figure 2. It is clear from the
definition that:

$$\mathrm{ord}(L) = \mathrm{ord}(M) \iff \hat{L} = \hat{M}.$$

Because of Lemma 2.1, the order complex ord(M) captures all features of M that are
preserved under the action of monotonically increasing functions.
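
As an illustration of Definition 2.2, the nested graphs can be built by sorting the off-diagonal entries from largest to smallest and adding one edge at a time. This is a minimal sketch that assumes distinct off-diagonal entries; the function name `order_complex` and the representation of graphs as sets of vertex pairs are choices made for this example.

```python
from itertools import combinations

def order_complex(M):
    """Return the edge sets of G_0, G_1, ..., G_{p+1} for a symmetric matrix M.

    Assumes distinct off-diagonal entries, so each graph adds exactly one edge.
    Vertices are labelled 1..N and each edge is a pair (i, j) with i < j.
    """
    n = len(M)
    pairs = list(combinations(range(n), 2))
    # The largest entries enter the filtration first.
    pairs.sort(key=lambda ij: M[ij[0]][ij[1]], reverse=True)
    graphs = [frozenset()]  # G_0 has no edges
    for i, j in pairs:
        graphs.append(graphs[-1] | {(i + 1, j + 1)})
    return graphs

M = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
for r, edges in enumerate(order_complex(M)):
    print(r, sorted(edges))
```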


3 Clique topology 


We are now ready to introduce clique topology, a tool for extracting invariant features of
a matrix from the ordering of matrix entries. We begin by describing the clique topology
of a single graph G, by which we simply mean the homology of its clique complex,

$$H_i(X(G), k),$$

where k is a field (more on the field in section 3.2). The clique complex, X(G), is defined
in section 3.1, while the simplicial homology groups, $H_i(X(G), k)$, will be defined in
section 3.3. We refer to these invariants as clique topology in order to indicate that we
are measuring topological features of the organization of cliques in the graph, rather than
the usual topology of the graph.

We summarize the information present in clique topology via a set of Betti numbers,
$\beta_i(X(G))$, which are the ranks of the corresponding homology groups:

$$\beta_i(X(G)) \stackrel{\mathrm{def}}{=} \operatorname{rank} H_i(X(G), k).$$

The clique topology of a symmetric matrix M, with order complex $G_0 \subset G_1 \subset \cdots \subset G_{p+1}$,
is reflected in the sequences of Betti numbers $\beta_i(X(G_r))$, computed for various dimensions
$i = 0, 1, 2, \ldots$, and for each graph $G_r$ in ord(M) (see section 3.4).

The reader familiar with homology of simplicial complexes, including clique complexes,
should feel free to skip the next few sections and proceed directly to section 3.4, where
we define Betti curves.


3.1 The clique complex of a graph 

Recall that a clique in a graph G is an all-to-all connected collection of vertices in G.
An m-clique is a clique consisting of m vertices. Note that if $\sigma$ is a clique of G, then all
subsets of $\sigma$ are also cliques.

Definition 3.1. Let G be a graph with N vertices. The clique complex of G, denoted
X(G), is the set of all cliques of G:

$$X(G) = \{\sigma \subseteq [N] \mid \sigma \text{ is a clique of } G\}.$$

We write $X_m(G)$ for the set of (m + 1)-cliques of G.


The shift in index reflects the “dimension” of a clique, when the clique complex is
represented geometrically. If we think of the vertices of the graph G as embedded generically
in a high-dimensional space, each clique represents the simplex given by the convex hull of
its vertices. For example, the convex hull of two vertices is a 1-dimensional edge, for three
vertices we obtain a 2-dimensional triangle, and four vertices yield a 3-dimensional
tetrahedron. Thus, cliques in $X_m(G)$ consist of m + 1 vertices, but represent m-dimensional
simplices.

The boundary of a clique $\sigma \subseteq G$ is the collection of subcliques $\tau \subset \sigma$ which have one
fewer vertex. This corresponds to the set of lower-dimensional simplices that comprise
the boundary of the simplex defined by $\sigma$ (Figure 3b).
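
For small graphs, Definition 3.1 and the notion of boundary can be made concrete by brute-force enumeration. The sketch below is illustrative only: the function names are hypothetical, and the example graph is the five-vertex graph with edges 12, 23, 34, 14, 25, 35, chosen here as an assumption for the demonstration.

```python
from itertools import combinations

def clique_complex(n_vertices, edges):
    """Return {m: list of (m+1)-cliques} for a graph on vertices 1..n_vertices.

    Cliques are sorted vertex tuples. Every vertex subset is tested, so this
    is only suitable for small graphs.
    """
    edge_set = {frozenset(e) for e in edges}
    X = {}
    for size in range(1, n_vertices + 1):
        cliques = [
            c for c in combinations(range(1, n_vertices + 1), size)
            if all(frozenset(p) in edge_set for p in combinations(c, 2))
        ]
        if not cliques:
            break  # no cliques of this size implies none of larger size
        X[size - 1] = cliques
    return X

def boundary_faces(clique):
    """Return the subcliques of `clique` that have one fewer vertex."""
    return [clique[:i] + clique[i + 1:] for i in range(len(clique))]

edges = [(1, 2), (2, 3), (3, 4), (1, 4), (2, 5), (3, 5)]
X = clique_complex(5, edges)
print(X[2])                    # the 3-cliques (triangles)
print(boundary_faces(X[2][0])) # the edges bounding the first triangle
```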


The homology of a clique complex X(G), to be defined in section 3.3, is a measurement
of relationships among the cliques in G. Intuitively, homology counts cycles in the clique
complex, a higher-dimensional generalization of the notion of cycles in a graph (Figure
3a). A collection of cliques forms a cycle if their boundaries overlap so as to “cancel”
one another (Figure 3b). We also wish to avoid double-counting cycles which are in our
geometric sense equivalent. In particular, two cycles are considered equivalent if one can
be deformed into the other without leaving the clique complex (Figure 3c). Alternatively,
if we can “combine” two cycles to form a third (Figure 3d), we should not detect their
concatenation as a new independent cycle.



Figure 3: Illustrations of homology ideas. 

3.2 Chains and boundaries


To make the above notion of “cancellation” of boundaries precise (and computable), one
introduces linear combinations of cliques, called chains. Given a set of cliques
$\sigma_1, \ldots, \sigma_\ell \in X(G)$, one can form a vector space consisting of formal linear combinations
of cliques with coefficients in a field k:

$$\sum_{i=1}^{\ell} a_i c_{\sigma_i}, \quad \text{where } a_i \in k,$$

and $c_{\sigma_i}$ denotes the basis element corresponding to the clique $\sigma_i$. To define chain groups,
one considers linear combinations of cliques of the same size. Recall that $X_m(G)$ denotes
the set of (m + 1)-cliques of G.

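Definition 3.2. The m-th chain group of X(G), with coefficients in k, is the k-vector
space:

$$C_m(X(G); k) \stackrel{\mathrm{def}}{=} \left\{ \sum_{i=1}^{\ell} a_i c_{\sigma_i} \;\middle|\; \sigma_i \in X_m(G) \text{ and } a_i \in k \text{ for each } i = 1, \ldots, \ell \right\}.$$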

As we will always be working with coefficients in an arbitrary field k,² we will omit it
from the notation and write $C_m(X(G))$ instead of $C_m(X(G); k)$. Note that $C_0(X(G))$
consists of formal linear combinations of 1-cliques (vertices), $C_1(X(G))$ of 2-cliques (edges),
$C_2(X(G))$ of 3-cliques (triangles), and so on.

The boundaries of cliques can also be described algebraically, allowing this notion to
be extended to chains. If $\sigma = \{v_0, v_1, \ldots, v_m\}$ is an (m + 1)-clique of G, we use the notation

$$c_\sigma = c_{v_0 v_1 \ldots v_m},$$

where $v_0 < v_1 < \ldots < v_m$ (recall that each $v_i \in [N]$). Consistent ordering is important
because it affects the signs in the boundary map. Given a sequence of vertices $v_0 v_1 \ldots v_m$,
we denote by $v_0 v_1 \ldots \hat{v}_i \ldots v_m$ the sequence obtained by omitting the element $v_i$. Note
that for each $\sigma \in X_m(G)$, the element $c_\sigma = c_{v_0 v_1 \ldots v_m}$ is a basis element of the vector space
$C_m(X(G))$.



² For readers uncomfortable with the notion of a general field k, it is relatively harmless to substitute
$\mathbb{R}$ or $\mathbb{Q}$ for k for the remainder of the discussion. One should keep in mind, however, that the actual
computations typically take place with k a finite field, $\mathbb{Z}/p\mathbb{Z}$. This can have an effect on the result: in
such a field, one can add a boundary to itself a finite number of times and get zero, creating “extra”
cycles, called torsion cycles, that would not be present over $\mathbb{R}$. These extra cycles measure aspects of
the clique complex that are not relevant to our purposes. In our software we have chosen the field to be
$\mathbb{Z}/2\mathbb{Z}$, but this choice is somewhat arbitrary and not important. So long as all computations are done
using the same field, comparing the resulting homology groups across different graphs is entirely valid.

Definition 3.3. The boundary map $\partial_m : C_m(X(G)) \to C_{m-1}(X(G))$, for m > 0, is given
on basis elements $c_{v_0 v_1 \ldots v_m}$ by

$$\partial_m(c_{v_0 v_1 \ldots v_m}) = \sum_{i=0}^{m} (-1)^i \, c_{v_0 v_1 \ldots \hat{v}_i \ldots v_m},$$

and is extended via linearity to general chains; i.e. $\partial_m(\sum_j a_j c_{\sigma_j}) = \sum_j a_j \partial_m(c_{\sigma_j})$. The
map $\partial_0$ is defined to be the zero map.

Recall that in the geometric picture, an (m + 1)-clique corresponds to an m-dimensional
simplex, and the boundary of this simplex is the set of m-cliques comprising its
(m − 1)-dimensional facets; that is, all subcliques on one fewer vertex. We have thus defined
the boundary of a chain in $C_m(X(G))$ in a fashion consistent with our geometric
understanding: as a formal sum of chains in $C_{m-1}(X(G))$, corresponding to simplices that are
one dimension lower (see Figure 3b). Note that signs are assigned to the elements of this
formal sum to indicate the orientation of cliques, which will be critical for obtaining the
desired “cancellation” of boundaries (see Remark 3.6 for details).

Example 3.4. Suppose $\sigma, \tau \in X_2(G)$ are cliques on vertices {1,2,3} and {1,2,4}
respectively. The boundary of the 2-chain $c_\sigma - c_\tau \in C_2(X(G))$ is

$$\begin{aligned}
\partial_2(c_{123} - c_{124}) &= \partial_2(c_{123}) - \partial_2(c_{124}) \\
&= (c_{23} - c_{13} + c_{12}) - (c_{24} - c_{14} + c_{12}) \\
&= c_{23} - c_{13} - c_{24} + c_{14}.
\end{aligned}$$

The cancellation of $c_{12}$ reflects the fact that the clique {1,2} appears twice in the boundary
of $c_\sigma - c_\tau$, with opposite orientation. Note also that applying $\partial_1$ to the resulting 1-chain
yields

$$\begin{aligned}
\partial_1(\partial_2(c_{123} - c_{124})) &= \partial_1(c_{23} - c_{13} - c_{24} + c_{14}) \\
&= (c_3 - c_2) - (c_3 - c_1) - (c_4 - c_2) + (c_4 - c_1) \\
&= 0.
\end{aligned}$$

In fact, it is straightforward to check from the definition that the composition of two
subsequent boundary maps always yields 0. In other words,

Lemma 3.5. For any m > 0, $\partial_m \circ \partial_{m+1} = 0$.
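
The computation in Example 3.4 can be reproduced with a small sketch of the boundary map of Definition 3.3. The representation of a chain as a dictionary from sorted vertex tuples to coefficients is an assumption made for this illustration, and integer coefficients are used in place of a general field k.

```python
from collections import defaultdict

def boundary(chain):
    """Apply the boundary map to a chain.

    A chain is a dict mapping cliques (sorted vertex tuples) to coefficients.
    Chains supported on single vertices are sent to the zero chain.
    """
    result = defaultdict(int)
    for clique, coeff in chain.items():
        if len(clique) == 1:
            continue  # the boundary map on 0-chains is the zero map
        for i in range(len(clique)):
            face = clique[:i] + clique[i + 1:]  # omit the i-th vertex
            result[face] += (-1) ** i * coeff
    # Drop terms whose coefficients cancelled.
    return {face: c for face, c in result.items() if c != 0}

two_chain = {(1, 2, 3): 1, (1, 2, 4): -1}   # c_123 - c_124
one_chain = boundary(two_chain)
print(one_chain)            # c_23 - c_13 - c_24 + c_14
print(boundary(one_chain))  # empty dict: the zero chain
```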

Remark 3.6. The orientation of cliques can be positive or negative. The vertices of a
clique $\sigma = \{v_0, v_1, \ldots, v_m\} \in X_m(G)$ have a canonical ordering induced by the usual ordering of the
vertices [N] of G. We define the canonical ordering to have positive orientation for each
clique. Any other ordering can be obtained as a permutation of the canonical ordering,
and the resulting ordering is positive or negative according to the sign of the permutation.
For example, $c_{124}$ has a positive orientation, while $c_{214}$ is negatively oriented. When we
compute the boundary of a clique $c_\sigma$ in Definition 3.3, the signs arise as a result of the
induced orientation on the boundary cliques. The result of taking all cliques on the
boundary is the signed sum we obtain in Definition 3.3.





3.3 Homology of a clique complex 

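For a given graph G, the chain groups $C_m(X(G))$ can be strung together to form a chain
complex:

$$0 \xrightarrow{\;\partial_{k+1}\;} C_k(X(G)) \xrightarrow{\;\partial_k\;} C_{k-1}(X(G)) \xrightarrow{\;\partial_{k-1}\;} \cdots \xrightarrow{\;\partial_2\;} C_1(X(G)) \xrightarrow{\;\partial_1\;} C_0(X(G)) \xrightarrow{\;\partial_0\;} 0.$$

The zeroes at either end of the complex represent the zero-dimensional k-vector space,
and the maps at each end are necessarily the zero map.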

If a chain is in the kernel of the boundary map, it is because the (oriented) boundaries
of its constituent cliques cancel one another. This is precisely the desired notion of a
cycle, so the set of m-cycles is exactly $\ker(\partial_m)$; in particular, 1-cycles correspond to the
usual notion of cycles in a graph. Note also that any chain in $C_m(X(G))$ which forms the
boundary of a clique in $X_{m+1}(G)$ is itself a cycle, so its own boundary should be zero.
This is reflected in the fact that $\partial_m \circ \partial_{m+1} = 0$ (Lemma 3.5). In particular,

$$\operatorname{im} \partial_{m+1} \subseteq \ker \partial_m.$$


When we are counting cycles for homology, we do not want to consider those which
arise as boundaries of chains, as these are “filled in.” For example, the two clique complexes
in Figure 4 should have the same number of homology 1-cycles. In Figure 4b, we do not
wish to count the chain $c_{23} + c_{35} - c_{25} \in C_1(X(G))$ as a 1-cycle because it is the boundary
of a clique, $c_{235} \in C_2(X(G))$.



Figure 4: Two clique complexes for graphs on 4 and 5 vertices. 


In order to eliminate cycles that are boundaries of higher-dimensional cliques, one
computes quotient vector spaces, $\ker(\partial_m)/\operatorname{im}(\partial_{m+1})$.


Definition 3.7. The m-th homology group of X(G) with coefficients in k is the quotient
space

$$H_m(X(G); k) = \frac{\ker(\partial_m)}{\operatorname{im}(\partial_{m+1})}.$$

As with chain groups, we will omit the field from our notation and write simply $H_m(X(G))$.

Observe that the zeroth homology group is special: since $\partial_0 = 0$, its kernel is always
$C_0(X(G))$. The quotient $\ker(\partial_0)/\operatorname{im}(\partial_1)$ thus identifies vertices which are connected to
one another, so that $H_0(X(G))$ is a vector space whose basis can be chosen to correspond
to the connected components of G.

Example 3.8. Let G be the graph on four vertices in Figure 4a. The kernel of the boundary
map $\partial_1 : C_1(X(G)) \to C_0(X(G))$ is the one-dimensional space spanned by
$\sigma = c_{12} + c_{23} + c_{34} - c_{14}$. Indeed, $\partial_1(\sigma) = (c_2 - c_1) + (c_3 - c_2) + (c_4 - c_3) - (c_4 - c_1) = 0$.
Since there are no cliques of size greater than 2, $C_2(X(G)) = 0$ and hence $\partial_2 = 0$. It follows
that $H_1(X(G))$ is precisely the one-dimensional vector space spanned by $\sigma$. Furthermore,
since $C_1(X(G))$ has dimension 4 and $\ker \partial_1$ has dimension 1, it follows that $\operatorname{im} \partial_1$ has
dimension 3. We can thus deduce that $H_0(X(G))$ is also one-dimensional, consistent with
the fact that G has just one connected component.

Next, consider the graph G' on five vertices in Figure 4b. This graph has been obtained
from G by “attaching” the clique {2, 3, 5}. The kernel of $\partial_1$ is now 2-dimensional, and
is spanned by both $\sigma$ and a new cycle, $\tau = c_{23} + c_{35} - c_{25}$. However, $\tau \in \operatorname{im} \partial_2$, so we
find that $H_1(X(G'))$ continues to be one-dimensional, consistent with our intuition that
G and G' both have just one cycle that has not been “filled in” by cliques.
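
The dimension counts in Example 3.8 can be verified with explicit boundary matrices and rank computations. This is a minimal sketch, assuming NumPy and real coefficients rather than the finite field Z/2Z mentioned in the footnote on fields; the helper name `d1_matrix` and the ordering of the edge basis are choices made for this illustration.

```python
import numpy as np

def d1_matrix(n_vertices, edges):
    """Matrix of the boundary map from 1-chains to 0-chains (real coefficients)."""
    D = np.zeros((n_vertices, len(edges)))
    for col, (i, j) in enumerate(edges):
        D[i - 1, col] = -1.0  # the boundary of c_ij is c_j - c_i for i < j
        D[j - 1, col] = 1.0
    return D

# Graph G: a 4-cycle with no triangles.
edges_G = [(1, 2), (2, 3), (3, 4), (1, 4)]
rank_d1_G = np.linalg.matrix_rank(d1_matrix(4, edges_G))
betti0_G = 4 - rank_d1_G                    # dim C_0 - rank of d_1
betti1_G = (len(edges_G) - rank_d1_G) - 0   # no 2-chains, so im d_2 = 0

# Graph G': attach the clique {2, 3, 5}.
edges_Gp = edges_G + [(2, 5), (3, 5)]
rank_d1_Gp = np.linalg.matrix_rank(d1_matrix(5, edges_Gp))
# The boundary of c_235 is c_35 - c_25 + c_23, written in the edge basis above.
d2_Gp = np.zeros((len(edges_Gp), 1))
d2_Gp[edges_Gp.index((2, 3)), 0] = 1.0
d2_Gp[edges_Gp.index((2, 5)), 0] = -1.0
d2_Gp[edges_Gp.index((3, 5)), 0] = 1.0
rank_d2_Gp = np.linalg.matrix_rank(d2_Gp)
betti0_Gp = 5 - rank_d1_Gp
betti1_Gp = (len(edges_Gp) - rank_d1_Gp) - rank_d2_Gp

print(betti0_G, betti1_G, betti0_Gp, betti1_Gp)
```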



Figure 5: Cross-polytopes generate the minimal clique complexes which produce homology 
in each dimension. Adapted from Figure 1 of the main text. 


Example 3.9. The smallest example of a graph $G_m$ whose clique complex has non-trivial
m-th homology group is the 1-skeleton of the (m + 1)-dimensional cross-polytope (Figure
5). Such a graph can be built inductively starting from the graph $G_0$ (Figure 5a), having
just two vertices and no edges. To obtain $G_1$ from $G_0$, we attach two new vertices and
include all edges between the new vertices and the vertices of $G_0$ (Figure 5b). More
generally, to obtain $G_i$ from $G_{i-1}$ we attach two new vertices and all edges between
these new vertices and those of $G_{i-1}$. Thus, we obtain $G_2$ (Figure 5c) and $G_3$ (Figure
5d), which give minimal examples of graphs whose clique complexes have a non-trivial
homology 2-cycle and 3-cycle, respectively.
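
The inductive construction in Example 3.9 is easy to sketch in code. As a light consistency check (not a homology computation), the example below also evaluates the Euler characteristic of the clique complex, the alternating count of cliques by dimension, which by a standard fact equals the alternating sum of the Betti numbers. Function names and the 0-based vertex labels are assumptions made for this illustration.

```python
from itertools import combinations

def cross_polytope_graph(m):
    """Return (vertices, edges) of G_m, built inductively: start with two
    non-adjacent vertices, then m times add two new vertices joined to all
    existing vertices (but not to each other)."""
    vertices, edges = [0, 1], set()
    for _ in range(m):
        a, b = len(vertices), len(vertices) + 1
        edges |= {(v, a) for v in vertices} | {(v, b) for v in vertices}
        vertices += [a, b]
    return vertices, edges

def euler_characteristic(vertices, edges):
    """Alternating count of cliques by dimension (brute force, small graphs only)."""
    chi = 0
    for size in range(1, len(vertices) + 1):
        count = sum(
            all(p in edges for p in combinations(c, 2))
            for c in combinations(vertices, size)
        )
        chi += (-1) ** (size - 1) * count
    return chi

for m in range(4):
    vertices, edges = cross_polytope_graph(m)
    print(m, len(vertices), len(edges), euler_characteristic(vertices, edges))
```

For m = 1, 2, 3 the values are consistent with a single connected component together with a single independent m-cycle, since 1 + (-1)^m is the Euler characteristic of such a complex.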

A useful characterization of the clique topology of a graph is obtained by simply 
tracking the dimensions of the homology groups. This is done via the so-called Betti 
numbers. 

Definition 3.10. The m-th Betti number of X(G), denoted $\beta_m(X(G))$, is the rank of
$H_m(X(G); k)$ as a k-vector space.

While this information discards the identities of individual cycles, it is well-suited to 
statistical methods as it reduces the clique topology of a graph to a sequence of integers. 

3.4 Clique topology across the order complex 

We now turn our attention to the clique topology of all graphs in the order complex at 
once. For a matrix M, the Betti numbers of the graphs in ord(M) are collected as follows. 

Definition 3.11. Let M be a real symmetric matrix and
ord(M) = ($G_0 \subset G_1 \subset G_2 \subset \cdots \subset G_{p+1}$) its order complex, where $p = \max_{i<j} \hat{M}_{ij}$. The m-th Betti curve of M is the
sequence of numbers $\{\beta_m(\rho_r)\}$ where $\rho_r$ is the edge density of the graph $G_r$, and

$$\beta_m(\rho_r) \stackrel{\mathrm{def}}{=} \operatorname{rank} H_m(X(G_r)).$$

As the matrix M will be clear from context, we omit it from the notation.

While each Betti curve is a discrete sequence, we can think of it as being a piecewise
constant function. To simplify comparison, we consider as a summary statistic the integral
of the entire Betti curve. We call this the m-th total Betti number of the matrix M, given
by

$$\bar{\beta}_m(M) = \sum_{r=1}^{p+1} \beta_m(\rho_r)\,\Delta\rho_r = \int_0^1 \beta_m(\rho)\,d\rho,$$

where $\Delta\rho_r$ is the change in edge density between $G_r$ and $G_{r-1}$.³ Typically, $\Delta\rho_r = 1/\binom{N}{2}$,
which is the change in density after adding a single edge. As we will see, the $\bar{\beta}_m$ alone can
distinguish between a random symmetric matrix, drawn from a distribution with i.i.d.
entries, and a geometric matrix, which arises from distances between a set of
randomly-distributed points in Euclidean space. Thus, we can use the total Betti number to test
the hypotheses that a matrix is random or geometric.
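
For very small matrices, a Betti curve and the total Betti number can be computed directly from the definitions by recomputing ranks of boundary matrices at every step of the order complex. The sketch below is only meant to make Definition 3.11 and the sum above concrete. It assumes NumPy, real coefficients, and distinct off-diagonal entries, it scales badly with N, and it is not the persistence-based method used for the actual computations. All function names are hypothetical.

```python
from itertools import combinations
from math import comb
import numpy as np

def cliques_of_size(n, edges, size):
    """All cliques with `size` vertices, as sorted tuples of 0-based vertices."""
    return [c for c in combinations(range(n), size)
            if all(p in edges for p in combinations(c, 2))]

def boundary_rank(faces, cliques):
    """Rank of the boundary map from chains on `cliques` to chains on `faces`."""
    if not faces or not cliques:
        return 0
    index = {f: row for row, f in enumerate(faces)}
    D = np.zeros((len(faces), len(cliques)))
    for col, c in enumerate(cliques):
        for i in range(len(c)):
            D[index[c[:i] + c[i + 1:]], col] = (-1) ** i
    return np.linalg.matrix_rank(D)

def betti(n, edges, m):
    """m-th Betti number of the clique complex, with real coefficients."""
    lower = cliques_of_size(n, edges, m) if m > 0 else []
    mid = cliques_of_size(n, edges, m + 1)
    upper = cliques_of_size(n, edges, m + 2)
    kernel_dim = len(mid) - boundary_rank(lower, mid)
    return kernel_dim - boundary_rank(mid, upper)

def betti_curve(M, m):
    """Betti numbers of G_1, ..., G_{p+1}, adding edges from largest entry down."""
    n = len(M)
    pairs = sorted(combinations(range(n), 2), key=lambda ij: -M[ij[0]][ij[1]])
    edges, curve = set(), []
    for pair in pairs:
        edges.add(pair)
        curve.append(betti(n, edges, m))
    return curve

def total_betti(M, m):
    """Sum of the Betti curve times the density step 1 / C(N, 2)."""
    return sum(betti_curve(M, m)) / comb(len(M), 2)

M = [[0, 6, 2, 3],
     [6, 0, 5, 1],
     [2, 5, 0, 4],
     [3, 1, 4, 0]]
print(betti_curve(M, 1), total_betti(M, 1))
```

In this example the four largest entries form a 4-cycle, which is filled in by triangles as soon as the next edge is added.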

4 Clique topology of random and geometric matrices 

In order to interpret the results of computing clique topology for matrices of interest, we
need suitable null models for comparison. This brings us back to our motivating questions
Q1 and Q2 from section 1. Can we use clique topology to reject the hypothesis that a given
matrix is random or geometric? This will be possible if matrices in these categories have
stereotyped Betti curves. In this case, it can be shown that a matrix with a substantially
different Betti curve is unlikely to have come from the given null model distribution, and
a p-value can be assigned to quantify the significance.

³ This measurement, $\bar{\beta}_m(M)$, also appears as the first element in the basis for the ring of algebraic
functions on the collection of all persistence structures described in [ACC13].

Because clique topology depends only on ord(M), it suffices to describe the distributions
of order complexes we obtain for random and geometric matrices. In both families,
the details of the Betti curves change with N; however, we find that their large-scale
features are robust once N > 50. This means Betti curves can indeed be used to reject
these models.

The distribution of random order complexes arises by sampling a matrix ordering $\hat{M}$
from the uniform distribution on all such orderings. For N x N symmetric matrices with
distinct entries this can be achieved by sampling permutations of $\{0, \ldots, \binom{N}{2} - 1\}$
uniformly at random. Equivalently, the matrix can be chosen with i.i.d. entries drawn
from any continuous distribution, or by shuffling the elements of a given matrix with
distinct off-diagonal entries. Thus, in a graph $G_\rho$ of ord(M) with edge density $\rho$, each edge has independent
probability $\rho$ of appearing. In other words, the graphs in the order complex are a nested
family of Erdős-Rényi random graphs. The clique topology of such complexes is relatively
well understood from a theoretical perspective [Kah09], with highly stereotyped, unimodal
Betti curves as illustrated in Figure 6a.
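
Two of the equivalent sampling procedures described above, drawing a uniformly random permutation and shuffling the entries of a given matrix, can be sketched as follows. This assumes NumPy; the function names and the use of -1 for the undefined diagonal are illustrative choices.

```python
from math import comb
import numpy as np

def random_matrix_ordering(n, rng):
    """Sample a symmetric integer matrix whose upper-triangular entries are a
    uniformly random permutation of 0, ..., C(n, 2) - 1 (diagonal set to -1)."""
    values = rng.permutation(comb(n, 2))
    hat = -np.ones((n, n), dtype=int)
    rows, cols = np.triu_indices(n, k=1)
    hat[rows, cols] = values
    hat[cols, rows] = values  # mirror into the lower triangle
    return hat

def shuffle_matrix(M, rng):
    """Shuffle the off-diagonal entries of a symmetric matrix, keeping symmetry."""
    M = np.array(M, dtype=float)  # work on a copy
    rows, cols = np.triu_indices(M.shape[0], k=1)
    values = rng.permutation(M[rows, cols])
    M[rows, cols] = values
    M[cols, rows] = values
    return M

rng = np.random.default_rng(1)
print(random_matrix_ordering(4, rng))
```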


Figure 6: Betti curves for random and geometric matrices. Panel titles: (a) random/shuffled,
(b) geometric. (a) N = 100, and means for $\beta_1(\rho)$ (yellow), $\beta_2(\rho)$ (red), and $\beta_3(\rho)$ (blue)
are displayed with bold lines, while shading indicates 99.5% confidence intervals.
(b) N = 100, and average Betti curves are displayed for dimensions
d = 10, 50, 100, 1000, 10000, in increasing order (i.e., higher curves correspond
to larger dimensions).

A geometric order complex is one arising from the negative distance matrix of a collection
of points embedded in some Euclidean space. We choose negative distance matrices so
that the highest matrix values correspond to the nearest distances; this is consistent with
the intuition that correlations should decrease with distance, as described in the main text.
Sampling such a complex consists of sampling N i.i.d. points, $\{p_i\}$, from some distribution
on $\mathbb{R}^d$. The associated sequence of clique complexes, $X(G_0) \subset X(G_1) \subset \ldots \subset X(G_{p+1})$,
corresponding to a geometric order complex is also referred to as the Vietoris-Rips complex
of the underlying points.

These complexes have been heavily studied in cases where the points are presumably
sampled from an underlying manifold [CZCG04]. In our setting, however, we sample
points from the uniform distribution on the unit cube in $\mathbb{R}^d$ for $d < N$. To our knowledge,
the Betti curves of geometric order complexes are largely unstudied. Our numerical
experiments show that they are highly stereotyped (Figure 6b), irrespective of d for a
large range of dimensions.⁴ Moreover, they are roughly an order of magnitude smaller
at the peak than the Betti curves of random order complexes with matching N, and the
peak values decrease rather than increase as we move from $\beta_1(\rho)$ to $\beta_2(\rho)$ and $\beta_3(\rho)$.
The differences between the Betti curves of random and geometric matrices can also be
understood through the lens of persistence lifetimes, which we will describe in section 5.2.
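
Sampling a geometric matrix as described above takes only a few lines. This is a minimal sketch, assuming NumPy; the function name `geometric_matrix` is hypothetical.

```python
import numpy as np

def geometric_matrix(n, d, rng):
    """Negative Euclidean distance matrix of n i.i.d. points drawn uniformly
    from the unit cube in R^d."""
    points = rng.random((n, d))
    diff = points[:, None, :] - points[None, :, :]
    return -np.linalg.norm(diff, axis=2)

rng = np.random.default_rng(0)
D = geometric_matrix(5, 3, rng)
print(np.round(D, 3))
```

Because the matrix holds negative distances, its largest off-diagonal entry corresponds to the closest pair of points, so that pair contributes the first edge of the order complex.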


4.1 Dimension of geometric order complexes 

Any matrix ordering $\hat{M}$ appears with equal probability in the distribution of random
symmetric matrices with i.i.d. entries. The consistency of the Betti curves in Figure 6a 
indicates that “most” of these matrix orderings have a similar organization of cliques. For 
geometric matrices, the possible matrix orderings are sampled in a highly non-uniform 
manner, leading to dramatically different Betti curves. Despite this, it is worth noting 
that any matrix ordering can in fact arise from a distance matrix. 

Definition 4.1. A set of points $p_1, \ldots, p_N \in \mathbb{R}^d$ is called a geometric realization of the
matrix ordering $\hat{M}$ if the distance matrix $D_{ij} = \|p_i - p_j\|$ has $\hat{D} = \hat{M}$.

Note that for each collection of three or more points, the (higher) triangle inequalities
implied by the metric impose strong constraints on $\hat{M}$. This means that for most matrix
orderings, the probability of sampling a point configuration in the unit cube that yields a
geometric realization of $\hat{M}$ is vanishingly small. This is why geometric Betti curves are, on
average, so different from those of random matrices. Nevertheless, geometric realizations
do always exist, provided $d \ge N - 1$.

⁴ We observed similar Betti curves to those in Figure 6b for values of d that were orders of magnitude
larger than N. Nevertheless, there is some evidence to indicate that Betti curves will approach those of
random order complexes as $d \to \infty$.

Lemma 4.2. Every N x N matrix ordering $\hat{M}$ that has distinct off-diagonal entries
possesses a geometric realization in (N − 1)-dimensional Euclidean space. Moreover, this
realization can be chosen as

[displayed formula for the points $p_i$ missing]

for small enough $\varepsilon > 0$, where M is any symmetric matrix with ordering $\hat{M}$ and zeroes
on the diagonal, and $\{e_i\}$ is the standard orthonormal basis in $\mathbb{R}^N$.

Proof. With the choice above, $\|p_i - p_j\|^2 = \|p_i\|^2 + \|p_j\|^2 - 2\, p_i \cdot p_j = 1 + \varepsilon M_{ij} + O(\varepsilon^2)$,
so for small enough $\varepsilon$ the ordering of the distances agrees with the ordering of the entries of M. □
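
The displayed formula for the points is not legible in this copy of the lemma. One choice that reproduces the expansion in the proof, offered here purely as an assumption and not as the authors' formula, is $p_i = \frac{1}{\sqrt{2}}\left(e_i - \frac{\varepsilon}{2}\sum_j M_{ij} e_j\right)$, for which $\|p_i - p_j\|^2 = 1 + \varepsilon M_{ij} + \frac{\varepsilon^2}{8}\|M_i - M_j\|^2$, where $M_i$ denotes the i-th row of M. The sketch below, assuming NumPy, checks numerically that this hypothetical construction realizes the ordering of a small example matrix.

```python
import numpy as np

def realize_ordering(M, eps):
    """Hypothetical construction: row i of the result is
    p_i = (e_i - (eps / 2) * sum_j M[i, j] e_j) / sqrt(2)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    return (np.eye(n) - 0.5 * eps * M) / np.sqrt(2)

# Example symmetric matrix with zero diagonal and distinct off-diagonal entries.
M = np.array([[0, 6, 2, 3],
              [6, 0, 5, 1],
              [2, 5, 0, 4],
              [3, 1, 4, 0]], dtype=float)
P = realize_ordering(M, eps=1e-3)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

iu = np.triu_indices(4, k=1)
# The pairwise distances are ordered exactly like the entries of M.
print(np.array_equal(np.argsort(D[iu]), np.argsort(M[iu])))
```

The N points produced this way live in $\mathbb{R}^N$ but span an affine subspace of dimension at most N − 1, consistent with the dimension stated in the lemma.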

Despite this fact, when we constrain the dimension d of the Euclidean space we do 
find matrix orderings that cannot be geometrically realized at all. This was the basis for 
our examples in Figure 2a of the main text. 

4.2 Figure 2a examples from the main text 

Here we prove that the d ≥ 2 and d ≥ 3 matrices (reproduced in Figure 1) cannot be
geometrically realized in lower dimensions.

To see why the d ≥ 2 matrix cannot arise from an arrangement of points on a line,
observe that the three smallest matrix entries are $M_{12}$, $M_{13}$, and $M_{14}$. This implies the
three shortest distances in a corresponding point arrangement must all involve the point
$p_1$, which is not possible for points on a line: at least two of $p_2, p_3, p_4$ would lie on the
same side of $p_1$, and the distance between those two would be shorter than the distance
from $p_1$ to the farther one.

To see why the d ≥ 3 matrix cannot arise from an arrangement of points on a plane,
notice that the six smallest matrix entries are $M_{ia}$, for i = 1, 2, 3 and a = 4, 5. This
means the six smallest distances are those of the form $\|p_i - p_a\|$, for i = 1, 2, 3 and
a = 4, 5. Without loss of generality we can assume $\|p_i - p_a\| < 1$, and all other distances
are greater than one. Now suppose the points $p_1, \ldots, p_5$ all lie in a plane. Then
$p_4, p_5 \in D(p_1) \cap D(p_2) \cap D(p_3)$, where $D(p_i)$ is a disk of radius 1 centered at $p_i$. Since none of the
disk centers is contained in either of the other two disks, the largest distance between two
points in the intersection $D(p_1) \cap D(p_2) \cap D(p_3)$ is less than one, and thus $\|p_4 - p_5\| < 1$,
which is a contradiction. We conclude that the matrix cannot arise from points in the
plane. We thank Anton Petrunin for this example.


5 Computational aspects and persistence 


Each graph in an order complex, $G_0 \subset G_1 \subset \cdots \subset G_{p+1}$, is a subgraph of its successor.
Intuitively, this means that the clique topology of any $G_r$ is closely related to the clique
topology of the previous graph, $G_{r-1}$. Exploiting this structure dramatically reduces
the computational complexity of finding Betti curves (defined in section 3.4), and also
provides us with finer matrix invariants in the form of persistence lifetimes of cycles.
This is achieved via persistent homology, an approach that enables homology cycles to be
tracked as we move from one graph in the order complex to the next.
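
The benefit of reusing information from one graph to the next is easiest to see in dimension zero, where connected components can be tracked with a union-find structure as edges are added. The sketch below covers only this zero-dimensional case and is not the persistence algorithm of the references; the function name and return format are assumptions made for the illustration, and distinct off-diagonal entries are assumed.

```python
from itertools import combinations

def zero_dim_persistence(M):
    """Track connected components across the order complex with union-find.

    Returns (curve, deaths): curve[r] is the number of connected components of
    G_r, and deaths lists the steps r at which two components merge.
    """
    n = len(M)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Edges enter in order of decreasing matrix entry.
    pairs = sorted(combinations(range(n), 2), key=lambda ij: -M[ij[0]][ij[1]])
    curve, deaths = [n], []
    for r, (i, j) in enumerate(pairs, start=1):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at step r
            deaths.append(r)
        curve.append(n - len(deaths))
    return curve, deaths

M = [[0, 6, 2, 3],
     [6, 0, 5, 1],
     [2, 5, 0, 4],
     [3, 1, 4, 0]]
print(zero_dim_persistence(M))
```

Each step updates the component count from the previous graph instead of recomputing it from scratch, which is the kind of saving that the nested structure makes possible.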


5.1 A brief history of persistent homology 

The mathematics underlying persistent homology has existed since the middle of the
twentieth century, in the guise of Morse theory and spectral sequences for the homology
of filtered spaces. Its interpretation as a tool for data analysis, however, is a much
more recent development. One can trace the origins of these applications to work on
size theory in computer vision [FL11, CFP01, FM99] and alpha shapes in computational
geometry [Rob99, Ede95, EM94, ELZ00]. The use of persistent homology as a tool for
the study of data sets relies on two fundamental and recent developments: computability
and robustness.

Computability arose from the persistence algorithm, developed first for subsets of
three-dimensional complexes in [ELZ00] and then extended to work with general simplicial
complexes in [ZC05]. In addition to the algorithm, these papers introduced the notions of
persistence diagrams and modules. Several software packages [TVJA11, Mor14, Tau11]
have been developed based on the persistence algorithm, and recent work using discrete
Morse theory has led to further improvements in speed and memory efficiency [Nan14].

Robustness to perturbations of the underlying simplicial complexes, on the other hand, 
was first explicitly shown through the bottleneck stability theorem of [CSEH07]. Further
work has broadened this result by developing more complete theoretical tools for the comparison
of persistence structures, divorcing their stability from any underlying geometry
[CCSG+09, CdSGO12, CdSO12]. It is this interpretation of stability that most clearly
applies to our study of order complexes. 

Although persistent homology has only recently emerged as a tool for studying features
of data, it has already found a broad range of applications [OGR13b, SMI+08b, GH10,
CIdSZ08].


5.2 Persistent homology of order complexes 

Here we present the basic ideas in persistent homology, restricted to the special case
of computing clique topology for order complexes. This means we need to apply the
persistence algorithm to filtered families of clique complexes,

$$X(G_0) \subset X(G_1) \subset \cdots \subset X(G_p) \subset X(G_{p+1}),$$

where the graphs $\{G_r\}$ comprise the order complex of a symmetric matrix. In order
to track homology cycles from one clique complex to the next, we need to understand
how the natural inclusion maps on the graphs, $\iota_r : G_r \hookrightarrow G_{r+1}$, translate to maps on
the corresponding cliques, chains, and homology groups, $H_m(X(G_r))$. This turns out
to be straightforward, as there is an obvious extension to maps on clique complexes,
$\iota_r : X(G_r) \hookrightarrow X(G_{r+1})$, and these in turn can be extended linearly to maps between
chain groups.

Lemma 5.1. Consider the order complex $G_0 \subset G_1 \subset \cdots \subset G_{p+1}$. The standard inclusion
maps, $\iota_r : G_r \hookrightarrow G_{r+1}$, induce maps on homology $(\iota_r)_m : H_m(X(G_r)) \to H_m(X(G_{r+1}))$.

Using these maps, we can follow individual cycles and understand their evolution as 
we move from one graph to the next in the order complex. Of particular interest are the 
edge densities at which cycles appear and disappear (Figure 7). 

Definition 5.2. Let $\omega \in H_m(X(G_r))$ be a non-zero cycle which is not in the image of
$(\iota_{r-1})_m$, and let $s > r$ be the smallest integer such that $(\iota_{s-1})_m \circ (\iota_{s-2})_m \circ \cdots \circ (\iota_r)_m(\omega) = 0$. We say
that $\omega$ is born at $r$ and dies at $s$, and has persistence lifetime $\ell(\omega) = s - r$.
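
As an illustration of Definition 5.2, the following sketch computes birth and death indices for a small order complex. It is a textbook column reduction of the boundary matrix with Z/2 coefficients, chosen here for simplicity; it is not the optimized algorithm implemented in Perseus, and the name `persistence_intervals` is hypothetical. It assumes that `order` lists every pair of vertices exactly once, in the order the edges are added, and that a clique enters the filtration at the step at which its last edge appears.

```python
from itertools import combinations


def persistence_intervals(n, order, max_dim=1):
    """Birth/death steps (r, s) of homology classes in dimensions <= max_dim.

    n: number of vertices. order: all edges (i, j), i < j, in the order added.
    Returns {dim: [(r, s), ...]}, with s = None for classes that never die.
    Classes with s == r (zero lifetime) are omitted.
    """
    rank = {e: r for r, e in enumerate(order, start=1)}  # edge -> step added
    simplices = []
    for k in range(1, max_dim + 3):                      # k vertices: dimension k - 1
        for s in combinations(range(n), k):
            step = max((rank[e] for e in combinations(s, 2)), default=0)
            simplices.append((step, k - 1, s))
    simplices.sort()                                     # faces precede cofaces
    index = {s: i for i, (_, _, s) in enumerate(simplices)}

    low_to_col, reduced, paired = {}, {}, set()
    intervals = {d: [] for d in range(max_dim + 1)}
    for j, (step, dim, s) in enumerate(simplices):
        # Boundary of s over Z/2: the set of its codimension-1 faces.
        col = {index[f] for f in combinations(s, dim)} if dim > 0 else set()
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]         # add columns mod 2
        if col:                                          # s kills the class created by simplex i
            i = max(col)
            low_to_col[i], reduced[j] = j, col
            paired.update((i, j))
            birth, birth_dim = simplices[i][0], simplices[i][1]
            if step > birth:
                intervals[birth_dim].append((birth, step))
    for i, (step, dim, _) in enumerate(simplices):
        if i not in paired and dim <= max_dim:
            intervals[dim].append((step, None))          # never dies
    return intervals


# Four vertices: the first four edges close a square, the fifth is a diagonal.
order = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3)]
intervals = persistence_intervals(4, order, max_dim=1)
lifetimes = [s - r for r, s in intervals[1] if s is not None]
print(intervals[1], lifetimes)   # [(4, 5)] [1]
```

In this example the square is completed at step 4 and the first diagonal, added at step 5, fills it with two triangles, so the 1-cycle is born at $r = 4$, dies at $s = 5$, and has lifetime 1.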



Figure 7: Illustration of persistence lifetime. The 1-cycle (yellow) appears at edge density
$\rho_1$ and disappears at a later edge density; its lifetime $\ell$ is the difference between the two.


For a given order complex, the distribution of persistence lifetimes provides a measure
of matrix structure that is complementary to the Betti curves defined in section 3.4.


5.3 Persistence lifetimes of random and geometric order complexes

Recall that there is a sharp qualitative difference in the Betti curves of random order 
complexes and those of geometric order complexes (Figure 6). These differences are also 
reflected in the distributions of their persistence lifetimes. While random complexes have 
relatively broad distributions (Figure 8a), the geometric complexes are heavily weighted 
toward shorter lifetimes (Figure 8b). The shapes of these distributions are a direct consequence
of the order in which edges are added in the order complex.

The qualitative differences in these distributions can be understood by thinking about 
dependencies in edge orderings in the order complex. Minimal cycles, represented by cross-polytopes
(Figure 5), are known to constitute the large majority of cycles in random order
complexes [KM13], and can thus be used to understand the shape of the distribution. Such
a cycle’s lifetime is governed by the density at which the first additional edge appears,
since the extra edge destroys the cycle by creating new cliques. Since the ordering of the
edges in a random order complex is completely random, the lifetimes will be broadly distributed. In contrast, geometric
order complexes are constrained by triangle inequalities (and higher-dimensional
analogues); these produce dependencies in the edge ordering which impose an upper limit
on the lifetime of small cycles, like the cross-polytopes.⁵ Persistence lifetimes in geometric
complexes are thus concentrated at short lifetimes.
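
The heuristic above can be explored numerically with a rough proxy. The sketch below does not compute persistence lifetimes; for every 4-cycle (the 1-dimensional cross-polytope) whose four sides all appear before either diagonal, it records the number of steps between the last side and the first diagonal. All names are hypothetical, and two modeling choices are assumptions made for illustration: random matrices have i.i.d. uniform entries, and geometric matrices are negative Euclidean distances between points sampled uniformly from the unit cube. Edges are added from the largest entry to the smallest.

```python
import itertools
import math
import random


def edge_rank(M):
    """Map each edge (i, j), i < j, to the step at which it is added."""
    edges = sorted(itertools.combinations(range(len(M)), 2),
                   key=lambda e: M[e[0]][e[1]], reverse=True)
    return {e: r for r, e in enumerate(edges, start=1)}


def square_gaps(M):
    """Steps between completing a diagonal-free 4-cycle and its first diagonal."""
    rank = edge_rank(M)

    def rk(i, j):
        return rank[(i, j)] if i < j else rank[(j, i)]

    gaps = []
    for a, b, c, d in itertools.combinations(range(len(M)), 4):
        # Three ways to choose the two diagonals among four vertices.
        for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            first_diagonal = min(rk(p, q), rk(r, s))
            last_side = max(rk(p, r), rk(p, s), rk(q, r), rk(q, s))
            if last_side < first_diagonal:
                gaps.append(first_diagonal - last_side)
    return gaps


def random_matrix(n, rng):
    """Symmetric matrix with i.i.d. uniform off-diagonal entries (assumed model)."""
    M = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        M[i][j] = M[j][i] = rng.random()
    return M


def geometric_matrix(n, d, rng):
    """Negative distances between n uniform points in the unit d-cube (assumed model)."""
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    return [[-math.dist(p, q) for q in pts] for p in pts]


rng = random.Random(0)
n = 12
for name, M in (("random", random_matrix(n, rng)),
                ("geometric", geometric_matrix(n, 2, rng))):
    gaps = square_gaps(M)
    mean_gap = sum(gaps) / len(gaps) if gaps else float("nan")
    print(name, len(gaps), mean_gap)
```

If the argument in the text carries over to this proxy, the gaps for the geometric matrix should tend to be shorter than those for the random matrix; a single small sample such as this one is only suggestive.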

5 This is true statistically. It is, of course, possible for two distances to be close in absolute terms and 
still be separated by many edges in the order complex, but this is rare enough that the intuition about 
Betti curves still holds. 

Figure 8: Persistence lifetimes for random and geometric matrices. (a) N = 100, and
mean lifetime distributions for 1-cycles (yellow), 2-cycles (red), and 3-cycles (blue) are
displayed with bold lines, while shading indicates 99.5% confidence intervals. (b) N = 100,
and average lifetime distributions are displayed for dimensions d = 10, 50, 100, 1000, 10000,
in increasing order (i.e., higher curves correspond to larger dimensions).


5.4 CliqueTop software 

To compute clique topology for symmetric matrices, we developed the CliqueTop Matlab
package. This software is maintained by Chad Giusti, and is available on GitHub at
https://github.com/nebneuron/clique-top. At the time of this writing, CliqueTop
makes use of one other package: Perseus [Nan14], by Vidit Nanda. Perseus provides an
implementation of the persistence algorithm, and is available at
http://www.sas.upenn.edu/~vnanda/perseus/index.html. Previous versions of CliqueTop also used the Cliquer
software package [NQ02].